CASSANALYTICS-151: Cannot read start offset from BTI with big partitions #199
lukasz-antoniak wants to merge 2 commits into apache:trunk from
michaelsembwever left a comment
+1 tested, fixes the issue
  try
  {
-     withPartitionIndex(ssTable, descriptor, metadata, true, false, (dataFileHandle, partitionFileHandle, rowFileHandle, partitionIndex) -> {
+     withPartitionIndex(ssTable, descriptor, metadata, true, true, (dataFileHandle, partitionFileHandle, rowFileHandle, partitionIndex) -> {
I think loadDataFile and loadRowsIndex are always true, so can we simplify the method signature?
In Cassandra's implementation, the row index component is always opened.
Good catch, thank you. Code updated.
  {
      LOGGER.error("Missing key key={} token={} partitioner={}",
                   key,
                   toToken(partitioner, index),
It should be toToken(partitioner, i), since i is the partition index. Please rename i too.
The variable index can then be removed.
  {
      LOGGER.error("Key read by more than 1 Spark partition key={} token={} partitioner={}",
                   key,
                   toToken(partitioner, index),
Same here: it should be toToken(partitioner, i).
  partitioner.name());
  }
  else if (count > 1)
  for (int j = 0; j < counts[i].length; j++)
nit: rename j to rowIndexInPartition?
  MutableInt skippedPartitions = new MutableInt(0);
  MutableLong skippedDataOffsets = new MutableLong(0);
- int[] counts = new int[numKeys];
+ int[][] counts = new int[numPartitions][numRowsPerPartition];
nit: rename counts to partitions
I think this two-dimensional array really stores the count of views for each partition and row, so maybe we should leave it as counts?
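To make the naming question concrete, here is a minimal, self-contained sketch of how a per-partition, per-row count array can be checked for the two conditions the log messages describe: a key that was never read (count 0) and a key read more than once (count > 1). The class ReadCountCheck, its methods, and the "partition/row" labels are hypothetical and not taken from the PR; the real test uses toToken and LOGGER instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the PR.
final class ReadCountCheck
{
    // Labels ("partition/row") of rows that were never read (count == 0).
    static List<String> missing(int[][] counts)
    {
        return collect(counts, 0, 0);
    }

    // Labels of rows that were read more than once (count > 1).
    static List<String> duplicated(int[][] counts)
    {
        return collect(counts, 2, Integer.MAX_VALUE);
    }

    // Collects labels of all rows whose count lies in [min, max].
    private static List<String> collect(int[][] counts, int min, int max)
    {
        List<String> result = new ArrayList<>();
        for (int partitionIndex = 0; partitionIndex < counts.length; partitionIndex++)
        {
            for (int rowIndexInPartition = 0; rowIndexInPartition < counts[partitionIndex].length; rowIndexInPartition++)
            {
                int count = counts[partitionIndex][rowIndexInPartition];
                if (count >= min && count <= max)
                {
                    result.add(partitionIndex + "/" + rowIndexInPartition);
                }
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        int[][] counts = new int[2][3];                   // 2 partitions, 3 rows each
        counts[0][0]++; counts[0][1]++; counts[0][2]++;   // partition 0: every row read once
        counts[1][0]++; counts[1][2]++; counts[1][2]++;   // partition 1: row 1 never read, row 2 read twice
        System.out.println("missing=" + missing(counts));
        System.out.println("duplicated=" + duplicated(counts));
    }
}
```

Each cell holds a number of reads rather than a partition, which is the argument for keeping the name counts; the loop variables use the partitionIndex and rowIndexInPartition names suggested in the review.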
Is the change in this file (for 4.0) necessary? The bug being fixed is only in the BTI format code path.
My motivation was to keep both classes in sync unless a change cannot be applied. I have now rolled back the change in the four-zero bridge.
Fixes CASSANALYTICS-151.